Paraphrasing Treebanks for Stochastic Realization Ranking

Authors

  • Erik Velldal
  • Stephan Oepen
  • Dan Flickinger
Abstract

This paper describes a novel approach to the task of realization ranking, i.e. the choice among competing paraphrases for a given input semantics, as produced by a generation system. We also introduce a notion of symmetric treebanks, which we define as the combination of (a) a set of pairings of surface forms and associated semantics with (b) the sets of alternative analyses for each surface form and of alternative realizations of each semantics. For the inclusion of alternative analyses and realizations in the symmetric treebank, we propose to make the underlying linguistic theory explicit and operational, viz. in the form of a broad-coverage computational grammar. Extending earlier work on grammar-based treebanks in the Redwoods (Oepen et al. [13]) paradigm, we present a fully automated procedure for producing a symmetric treebank from existing resources. To evaluate the utility of an initial (albeit smallish) such ‘expanded’ treebank, we report on experimental results for training stochastic discriminative models for the realization ranking task.

Our work is set within the context of a Norwegian–English machine translation project (LOGON; Oepen et al. [11]). The LOGON system builds on a relatively conventional semantic transfer architecture, based on Minimal Recursion Semantics (MRS; Copestake et al. [5]), and quite generally aims to combine a ‘deep’ linguistic backbone with stochastic processes for ambiguity management and improved robustness. In this paper we focus on the isolated subtask of ranking the output of the target language generator. For target language realization, LOGON uses the LinGO English Resource Grammar (ERG; Flickinger [6]) and the LKB generator, a lexically-driven chart generator that accepts MRS-style input semantics (Carroll et al. [2]). Over a representative LOGON data set, the generator already produces an average of 45 English realizations per input MRS; see Figure 1 for an example. As we expect to move to
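The abstract leaves the model form implicit, but discriminative realization ranking of this kind is commonly cast as a conditional log-linear model whose probability mass is normalized over the set of competing realizations generated for one input MRS. The Python sketch below illustrates that setup under this assumption; the bigram features, weights, and candidate strings are purely illustrative and are not taken from the paper, where features would instead be drawn from the grammar analyses recorded in the symmetric treebank.

import math
from collections import defaultdict

def extract_features(realization):
    # Toy feature extractor: word bigrams of the surface string. A ranker in
    # the setting described above would use features over grammar derivations.
    words = ["<s>"] + realization.split() + ["</s>"]
    feats = defaultdict(float)
    for w1, w2 in zip(words, words[1:]):
        feats[("bigram", w1, w2)] += 1.0
    return feats

def score(weights, realization):
    # Linear score w . f(r) for one candidate realization.
    return sum(weights.get(f, 0.0) * v
               for f, v in extract_features(realization).items())

def rank(weights, candidates):
    # Softmax over the candidate set for a single input MRS, so each
    # probability is conditional on the shared input semantics.
    scores = [score(weights, r) for r in candidates]
    z = sum(math.exp(s) for s in scores)
    return sorted(((r, math.exp(s) / z) for r, s in zip(candidates, scores)),
                  key=lambda pair: pair[1], reverse=True)

# Hypothetical competing realizations of one input semantics.
candidates = [
    "the dog barked at the cat",
    "at the cat the dog barked",
    "the cat was barked at by the dog",
]
weights = {("bigram", "<s>", "the"): 0.7, ("bigram", "barked", "at"): 0.5}
for realization, prob in rank(weights, candidates):
    print(f"{prob:.3f}  {realization}")

Under this formulation, training reduces to estimating the feature weights so that, for each treebanked input, the recorded preferred realization receives as much of the conditional probability mass as possible.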


Similar resources

A Generate and Rank Approach to Sentence Paraphrasing

We present a method that paraphrases a given sentence by first generating candidate paraphrases and then ranking (or classifying) them. The candidates are generated by applying existing paraphrasing rules extracted from parallel corpora. The ranking component considers not only the overall quality of the rules that produced each candidate, but also the extent to which they preserve grammaticali...
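As a rough, self-contained illustration of such a generate-and-rank pipeline (the rules, scores, and grammaticality stand-in below are invented for the example and are not the resources used in that work):

# Hypothetical paraphrasing rules as (pattern, replacement, rule-quality) triples;
# in the described method these would be rules extracted from parallel corpora.
RULES = [
    ("could not", "was unable to", 0.8),
    ("a large number of", "many", 0.9),
]

def generate_candidates(sentence):
    # Apply each rule once wherever its pattern occurs, yielding candidates.
    for pattern, replacement, quality in RULES:
        if pattern in sentence:
            yield sentence.replace(pattern, replacement, 1), quality

def rank_candidates(sentence, grammaticality):
    # Combine the quality of the rule that produced a candidate with a
    # grammaticality score, standing in for the richer ranking component.
    scored = [(0.5 * quality + 0.5 * grammaticality(cand), cand)
              for cand, quality in generate_candidates(sentence)]
    return sorted(scored, reverse=True)

# Toy grammaticality model: a constant here; a real system would plug in a
# language model or a trained classifier.
print(rank_candidates("he could not attend a large number of meetings",
                      grammaticality=lambda s: 0.5))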


A generalized super-efficiency model for ranking extreme efficient DMUs in stochastic DEA

In this study, a generalized super-efficiency model is first proposed for ranking extreme efficient decision making units (DMUs) in stochastic data envelopment analysis (DEA), and then a deterministic (crisp) equivalent form of the stochastic generalized super-efficiency model is presented. It is shown that this deterministic model can be converted to a quadratic programming model. So fa...


Paraphrasing Adaptation for Web Search Ranking

Mismatch between queries and documents is a key issue for the web search task. In order to narrow down such mismatch, in this paper, we present an in-depth investigation on adapting a paraphrasing technique to web search from three aspects: a search-oriented paraphrasing model; an NDCG-based parameter optimization algorithm; an enhanced ranking model leveraging augmented features computed on pa...
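For reference, NDCG, the metric that the parameter optimization mentioned above targets, is the discounted cumulative gain of a ranking divided by that of the ideal ranking. A minimal implementation of the standard formulation (not necessarily the exact variant used in that paper):

import math

def dcg_at_k(relevances, k):
    # Discounted cumulative gain over the top-k results, with the common
    # (2^rel - 1) / log2(rank + 1) gain and discount.
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    # Normalize by the DCG of the ideal (relevance-sorted) ordering.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Graded relevance labels of returned documents, in ranked order.
print(ndcg_at_k([3, 2, 3, 0, 1, 2], k=5))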


Efficiency Evaluation and Ranking DMUs in the Presence of Interval Data with Stochastic Bounds

Owing to uncertainty, DEA occasionally faces imprecise data, especially when a set of DMUs includes missing data, ordinal data, interval data, stochastic data, or fuzzy data. Therefore, how to evaluate the efficiency of a set of DMUs in interval environments is a problem worth studying. In this paper, we discuss a new method for evaluation and ranking i...


Automatic Extraction of Stochastic Lexicalized Tree Grammars from Treebanks

We present a method for the extraction of stochastic lexicalized tree grammars (SLTGs) of different complexities from existing treebanks, which allows us to analyze the relationship of a grammar automatically induced from a treebank wrt. its size, its complexity, and its predictive power on unseen data. Processing of the different SLTGs is performed by a stochastic version of the two-step Early-base...




Journal:

Volume:   Issue:

Pages:  -

Publication date: 2004